API for out-of-band scale-to-zero pin #237

Merged
sjmiller609 merged 10 commits into main from hypeship/scaletozero-pin-api on May 12, 2026

Conversation

@sjmiller609
Contributor

@sjmiller609 sjmiller609 commented May 11, 2026

Summary

Adds two new endpoints to the kernel-images server:

  • POST /system/standby/disable — pins scale-to-zero off until released
  • POST /system/standby/enable — releases the pin

The pin lives alongside the existing request-driven middleware refcount inside DebouncedController:

  • scale-to-zero stays disabled while either requests remain inflight or the pin is held
  • request-driven Enable (from middleware) does not release the pin
  • releasing the pin while requests are inflight defers the underlying enable until the last request completes
  • the configured re-enable cooldown applies on pin release exactly as it does on request release

DebouncedController and NoopController now also implement a new PinnedController sub-interface (Controller + DisablePin / EnablePin). The pin is a boolean — DisablePin / EnablePin are idempotent.
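The interaction between the request refcount and the pin can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `sketchController`, `Toggler`, and `countingToggler` are hypothetical names, the field layout is assumed, and the re-enable cooldown is deliberately left out to keep the example deterministic.

```go
package main

import "sync"

// Toggler is a hypothetical stand-in for the low-level scale-to-zero switch.
type Toggler interface {
	Disable() error
	Enable() error
}

// Controller is the request-driven, refcounted surface used by the middleware.
type Controller interface {
	Disable() error
	Enable() error
}

// PinnedController adds the boolean pin on top of Controller.
type PinnedController interface {
	Controller
	DisablePin() error
	EnablePin() error
}

// countingToggler records how often the underlying switch was flipped.
type countingToggler struct{ disables, enables int }

func (t *countingToggler) Disable() error { t.disables++; return nil }
func (t *countingToggler) Enable() error  { t.enables++; return nil }

// sketchController is an assumed, cooldown-free model of the described state.
type sketchController struct {
	mu       sync.Mutex
	ctrl     Toggler
	holders  int  // inflight request count
	pinned   bool // out-of-band pin
	disabled bool // true once ctrl.Disable has been applied
}

var _ PinnedController = (*sketchController)(nil)

func (c *sketchController) disableLocked() error {
	if c.disabled {
		return nil
	}
	if err := c.ctrl.Disable(); err != nil {
		return err
	}
	c.disabled = true
	return nil
}

// maybeReenableLocked re-enables only when no request and no pin holds it off.
func (c *sketchController) maybeReenableLocked() error {
	if c.holders > 0 || c.pinned || !c.disabled {
		return nil
	}
	if err := c.ctrl.Enable(); err != nil {
		return err
	}
	c.disabled = false
	return nil
}

func (c *sketchController) Disable() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.disableLocked(); err != nil {
		return err
	}
	c.holders++
	return nil
}

func (c *sketchController) Enable() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.holders > 0 {
		c.holders--
	}
	return c.maybeReenableLocked()
}

func (c *sketchController) DisablePin() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pinned {
		return nil // idempotent
	}
	if err := c.disableLocked(); err != nil {
		return err
	}
	c.pinned = true
	return nil
}

func (c *sketchController) EnablePin() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.pinned {
		return nil // idempotent
	}
	c.pinned = false
	if err := c.maybeReenableLocked(); err != nil {
		c.pinned = true // keep the pin so the caller can retry
		return err
	}
	return nil
}
```

In this model, releasing the pin while a request is still inflight only clears the flag; the underlying `Enable` happens when the last request calls `Enable()`.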

Why

This is the in-VM surface needed for a future control-plane integration: an external system (e.g. a hot-pool controller) needs to hold a VM out of standby while it sits idle in a pool, then release the hold when the VM is claimed.

The existing middleware refcount only works for inflight HTTP requests, so it can't hold a VM disabled across an idle period.

Notes for reviewers

  • All existing middleware-driven flows are byte-identical (the pin defaults to false; all current call sites take the same code paths).
  • The constructor return type for NewDebouncedController* widened from Controller to *DebouncedController so callers can access the pin methods. The only existing caller is cmd/api/main.go, which is unaffected since *DebouncedController still satisfies Controller for recorder.NewFFmpegRecorderFactory and scaletozero.Middleware.
  • Control-plane wiring (metro-api proxy + API server) is intentionally not in this PR.
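The constructor note above relies on a general Go property: a concrete pointer type returned by a constructor can still be passed wherever a narrower interface is expected. A stripped-down sketch, with method bodies and `useController` invented purely for illustration:

```go
package main

// Controller is the narrow, request-driven interface.
type Controller interface {
	Disable() error
	Enable() error
}

// PinnedController extends Controller with the pin methods.
type PinnedController interface {
	Controller
	DisablePin() error
	EnablePin() error
}

// DebouncedController here is a placeholder with trivial state, not the real type.
type DebouncedController struct {
	holders int
	pinned  bool
}

func (c *DebouncedController) Disable() error    { c.holders++; return nil }
func (c *DebouncedController) Enable() error     { c.holders--; return nil }
func (c *DebouncedController) DisablePin() error { c.pinned = true; return nil }
func (c *DebouncedController) EnablePin() error  { c.pinned = false; return nil }

// NewDebouncedController returns the concrete type so callers can reach the
// pin methods without a type assertion.
func NewDebouncedController() *DebouncedController { return &DebouncedController{} }

// useController stands in for a consumer that only accepts Controller,
// such as a middleware constructor.
func useController(c Controller) error {
	if err := c.Disable(); err != nil {
		return err
	}
	return c.Enable()
}
```

Existing consumers that take a `Controller` keep compiling unchanged, while new code can call `DisablePin` / `EnablePin` on the same value.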

Test plan

  • go test -race ./lib/scaletozero/... passes (6 new tests covering pin semantics)
  • go test -race ./cmd/api/... passes
  • go vet ./... clean
  • go build ./... clean
  • Manual review of openapi.yaml + regenerated lib/oapi/oapi.go

Note

Medium Risk
Introduces new API endpoints and modifies DebouncedController state machine to include a pinned hold, which could affect VM standby/idle behavior and timing if the pin logic misbehaves.

Overview
Adds control-plane style endpoints POST /scaletozero/disable and POST /scaletozero/enable that pin scale-to-zero off/on independently of the existing per-request middleware refcount.

Extends scaletozero with a new PinnedController interface and updates DebouncedController to track a boolean pinned hold that blocks request-driven re-enables until explicitly unpinned (still honoring the existing cooldown behavior).

Regenerates openapi.yaml/lib/oapi for the new routes and adds unit + e2e coverage to validate idempotency and that the pin does not interfere with normal request handling.

Reviewed by Cursor Bugbot for commit e15d112. Bugbot is set up for automated code reviews on this repo.

sjmiller609 and others added 9 commits May 8, 2026 22:48
Adds two new endpoints to the kernel-images server:
- POST /system/standby/disable — pins scale-to-zero off until released
- POST /system/standby/enable  — releases the pin

The pin lives alongside the existing request-driven middleware refcount in
DebouncedController: scale-to-zero stays disabled while either requests are
inflight OR the pin is held. Request-driven Enable calls do not
release the pin, so a pinned VM survives idle periods. Releasing the pin
honors any configured re-enable cooldown.

This is the in-VM surface for future control-plane integrations (e.g. a
hot-pool controller reserving a VM until it is claimed). Control-plane
wiring will follow in metro-api and the API server.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Spins up the headless image via testcontainers and exercises:
- Idempotent disable (two consecutive 204s)
- A normal request flows while pinned (middleware coexistence)
- Idempotent enable (two consecutive 204s)

The unikraft control file does not exist inside the docker test
container, so the underlying scale-to-zero write is a no-op. The
test validates HTTP wiring and handler/middleware coexistence; the
deep pin semantics are covered by unit tests against DebouncedController.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address review feedback:
- Path /system/standby/* implied VM-state mutation; rename to
  /scaletozero/{pin,unpin} so the operation is specific to the
  scale-to-zero gate.
- Interface methods DisablePin/EnablePin read as inverted; rename to
  Pin/Unpin for clarity.
- Rewrite openapi summary/description to be caller-focused (what it
  does, when to call, what pairs with).

Match user-facing terminology to the action ("disable scale to zero")
rather than the internal pin mechanism. Internal PinnedController.Pin/Unpin
methods retain pin/unpin naming since they're distinct from the refcounted
Controller.Disable/Enable.

Rename refcounted hold methods to Acquire/Release so that Disable/Enable
can carry the idempotent persistent-toggle semantics defined by the
/scaletozero/{disable,enable} API. Split the low-level direct toggle out
into a separate Toggler interface (unikraftCloudToggler) wrapped by
DebouncedController.

PinnedController's Pin/Unpin are the in-scope additions for the
/scaletozero/{disable,enable} endpoints. Rename them to DisableStz/EnableStz
so the verb matches the API. The pre-existing refcounted Controller.Disable
/Enable is left untouched, since DisableStz/EnableStz avoids a method-name
collision on the concrete DebouncedController.

Previously, Unpin set c.pinned = false before calling maybeReenableLocked.
If the underlying ctrl.Enable returned an error (no-cooldown path), the pin
was already released but c.disabled remained true, so a retry of Unpin hit
the !c.pinned early-return and became a no-op. The controller was stuck
disabled with no API-driven recovery path. Restore c.pinned on error so the
caller can retry, mirroring Pin's "flip the state flag only after the side
effect succeeded" pattern.
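The fix described in this commit message can be illustrated with a small model. The field and method names (`pinned`, `disabled`, `Unpin`, `maybeReenableLocked`) come from the message itself; the surrounding `controller` struct and the `flakyToggler` that fails once are assumptions made up for the example, and the cooldown path is omitted.

```go
package main

import (
	"errors"
	"sync"
)

// flakyToggler is a hypothetical toggler whose first Enable call fails.
type flakyToggler struct {
	failNext bool
	enabled  bool
}

func (t *flakyToggler) Enable() error {
	if t.failNext {
		t.failNext = false
		return errors.New("enable failed")
	}
	t.enabled = true
	return nil
}

// controller models only the state relevant to the Unpin bug.
type controller struct {
	mu       sync.Mutex
	ctrl     interface{ Enable() error }
	pinned   bool
	disabled bool
}

// maybeReenableLocked re-enables unless the pin is held or nothing is disabled.
func (c *controller) maybeReenableLocked() error {
	if c.pinned || !c.disabled {
		return nil
	}
	if err := c.ctrl.Enable(); err != nil {
		return err
	}
	c.disabled = false
	return nil
}

// Unpin releases the pin, restoring it if the underlying enable fails.
func (c *controller) Unpin() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.pinned {
		return nil // already released
	}
	c.pinned = false
	if err := c.maybeReenableLocked(); err != nil {
		// Without this restore, a retry would hit the early return above
		// and leave the controller stuck in the disabled state.
		c.pinned = true
		return err
	}
	return nil
}
```

With the restore in place, a second `Unpin` after a transient failure reaches `ctrl.Enable` again instead of returning early.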
@sjmiller609 sjmiller609 marked this pull request as ready for review May 11, 2026 19:02
@firetiger-agent

Monitoring Plan: /system/standby API for Out-of-Band Scale-to-Zero Pin

This PR adds two new HTTP endpoints (POST /system/standby/disable and POST /system/standby/enable) to the kernel-images browser VM server, enabling explicit lifecycle control of Unikraft/Hypeman scale-to-zero independently of the per-request middleware refcount. The core change introduces a PinnedController interface and a pinned boolean in DebouncedController that blocks re-enable calls from the HTTP middleware while the pin is held. DebouncedController's maybeReenableLocked helper is now shared between Enable and EnablePin. The change is well-tested with 6 new unit tests and a new e2e test.

Key risks to watch: (1) mutex contention or a logic error in the refactored maybeReenableLocked helper causing pool drain/fill regressions — watch for drops in kernel_browser_pool_pop_total or spikes in kernel_browser_pool_empty_pop_total vs. baseline (~50K–90K metric data points/hr pop rate, ~420–1,140/hr empty-pop count); (2) any ERROR logs mentioning "failed to disable standby" or "failed to release scale-to-zero pin" (expected baseline: 0); (3) pin state leaks if callers don't pair disable/enable. Blast radius is limited to individual VM processes — a bug would not affect the central API service.

Status updates will be posted automatically on this PR as monitoring progresses.

@sjmiller609 sjmiller609 requested review from Sayan- and hiroTamada and removed request for hiroTamada May 11, 2026 19:17
Comment thread server/openapi.yaml Outdated
Comment thread server/openapi.yaml Outdated
Comment thread server/lib/scaletozero/scaletozero.go Outdated
Comment thread server/openapi.yaml Outdated
Comment thread server/cmd/api/api/api.go
@sjmiller609 sjmiller609 requested a review from Sayan- May 11, 2026 23:00

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8361537.

Comment thread server/lib/scaletozero/scaletozero.go Outdated
@Sayan- Sayan- changed the title from "Add /system/standby API for out-of-band scale-to-zero pin" to "API for out-of-band scale-to-zero pin" May 12, 2026
Contributor

@Sayan- Sayan- left a comment

yeet - thanks for iterating!

@sjmiller609 sjmiller609 merged commit 15d9049 into main May 12, 2026
12 of 13 checks passed
@sjmiller609 sjmiller609 deleted the hypeship/scaletozero-pin-api branch May 12, 2026 13:42

2 participants